Language Models

Author

  • Martin Kay
Abstract

In statistically based natural language processing, a language model is used in the generation of output strings to assess the likelihood that a given string of words is a sentence in a particular language. If the information about the ordering of the words in the sentences of a text were discarded, so that each sentence was regarded simply as a bag of words, then the better of two language models would do a better job of restoring their order. According to the simplest view of statistical machine translation, this is just what the language model is called upon to do. For each source sentence, a translation model proposes a bag of words to be used in its translation, and it is the job of the language model to order them in the best way. There is, in fact, more to it than this, but it at least gives a setting for the use of language models. Language models are based on so-called n-grams. An n-gram is simply a sequence of n words. The likelihood that a permutation of the given bag of words is a sentence of the language is estimated by taking the product of the probabilities of all the n-grams that it contains. Thus, a sentence of k words contains k − n + 1 n-grams, and the probability of each of them is estimated on the basis of a body of training data. Various tricks are used to allow for permutations that did not occur at all in the training data. The value of n is fixed at a fairly low value (say 3, 4, or 5) because, though larger values generally give better results, the amount of training data required to permit a useful estimate of the probability of longer strings grows at an overwhelming rate with the value of n. This note results from my reflections on the possibility of language models based on sequences of variable size.
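The bag-of-words setting described above can be illustrated with a minimal Python sketch. It assumes a toy training corpus, relative-frequency n-gram estimates, and add-one smoothing as one possible stand-in for the "various tricks" the abstract mentions; the function names (`ngrams`, `train`, `log_score`, `best_order`) are hypothetical and not taken from the paper. Log-probabilities are summed in place of multiplying probabilities, which yields the same ranking.

```python
import math
from collections import Counter
from itertools import permutations


def ngrams(words, n):
    """Return the k - n + 1 n-grams of a k-word sequence (empty if k < n)."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def train(sentences, n):
    """Count every n-gram in a list of whitespace-tokenized sentences."""
    counts = Counter()
    for sentence in sentences:
        counts.update(ngrams(sentence.split(), n))
    return counts


def log_score(words, counts, n):
    """Sum of log-probabilities of the n-grams in `words`.

    Assumption: add-one smoothing over the observed n-gram types plus one
    slot for unseen n-grams, so that unseen n-grams get a nonzero estimate.
    """
    denom = sum(counts.values()) + len(counts) + 1
    return sum(math.log((counts[g] + 1) / denom) for g in ngrams(words, n))


def best_order(bag, counts, n):
    """Exhaustively score every permutation of the bag (feasible only for tiny bags)."""
    return max(permutations(bag), key=lambda p: log_score(p, counts, n))


corpus = ["the cat sat on the mat", "the dog sat on the rug"]
bigram_counts = train(corpus, 2)
print(best_order(["sat", "cat", "the"], bigram_counts, 2))
```

Exhaustive search over permutations grows factorially with the size of the bag, so this only shows the scoring idea; it is not a practical decoder.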
If a given set of training data of fairly modest size contains some three or four instances of strings consisting of eight or ten words, then this is surely a remarkable fact, and one from which it should be possible to derive important advantages. Observe, first of all, that it is a relatively straightforward matter to catalog the repeated sequences of whatever length in a corpus of text. They
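As a rough illustration of cataloging repeated sequences of any length, here is a naive Python sketch. The abstract is cut off before it describes the author's own procedure, so this is not that method: it simply counts n-grams for increasing n and stops once nothing repeats, which is safe because a sequence of length n + 1 cannot repeat unless its length-n prefix does. The function name `repeated_sequences` and the `min_count` parameter are hypothetical, and the corpus is treated as one flat token list, ignoring sentence boundaries.

```python
from collections import Counter


def repeated_sequences(tokens, min_count=2):
    """Map every word sequence occurring at least `min_count` times to its count."""
    repeats = {}
    n = 1
    while n <= len(tokens):
        # Count all n-grams of the current length.
        counts = Counter(
            tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
        )
        found = {gram: c for gram, c in counts.items() if c >= min_count}
        if not found:
            # No repeats at this length implies none at any greater length.
            break
        repeats.update(found)
        n += 1
    return repeats


tokens = "a b c x a b c y a b".split()
catalog = repeated_sequences(tokens)
longest = max(catalog, key=len)
print(longest, catalog[longest])
```

This recounts the corpus once per length, which is adequate for small corpora; an index-based structure would be more economical on large ones.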

Similar resources

On the Iranian In-service and Pre-service Language Teachers’ Perceptions of Educational Supervision Concerning their Professional Development

Teacher supervision plays a pivotal role in the improvement of education system and the way in which teachers and student teachers perceive it. Consequently language teacher supervisors can utilize appropriate supervisory models to keep teachers update and promote them professionally. The present study investigated the role of language teacher supervisors in student teachers and in-service teac...


Advertising Keyword Suggestion Using Relevance-Based Language Models from Wikipedia Rich Articles

When emerging technologies such as Search Engine Marketing (SEM) face tasks that require human level intelligence, it is inevitable to use the knowledge repositories to endow the machine with the breadth of knowledge available to humans. Keyword suggestion for search engine advertising is an important problem for sponsored search and SEM that requires a goldmine repository of knowledge. A recen...


The musical language Elements of Persian musical language: modes, rhythm and syntax

In treating the subject of musical language, a Persian musician would be intrinsically drawn to the structural similarities between the Persian music and language. Indeed Persian music and language are extremely related in their metrics, intonations and structural phrases (syntax). Although we will draw upon this relationship, our aim in this article is to present “music as a language,” c...


Non-Verbal Communication in Models of Communicative Competence and L2 Teachers’ Rating

Non-verbal communication (NVC) plays a major role in various aspects of human life (Andersen, 2004; Cameron, 2001; Johnstone, 2008). Children learning their first language come to realize non-verbal communication as their socialization process takes place (Fletcher & German, 1990; Ingram, 1996; Owens, 2001). However, most EFL learners may have little exposure to these non-verbal aspects of comm...


Modeling and Evaluation of Stochastic Discrete-Event Systems with RayLang Formalism

In recent years, formal methods have been used as an important tool for performance evaluation and verification of a wide range of systems. In the view points of engineers and practitioners, however, there are still some major difficulties in using formal methods. In this paper, we introduce a new formal modeling language to fill the gaps between object-oriented programming languages (OOPLs) us...


Publication date: 2006